Improving SNP discovery by base alignment quality

Author

  • Heng Li
Abstract

I propose a new application of profile Hidden Markov Models in the area of SNP discovery from resequencing data, to greatly reduce false SNP calls caused by misalignments around insertions and deletions (indels). The central concept is per-Base Alignment Quality (BAQ), which accurately measures the probability of a read base being wrongly aligned. The effectiveness of BAQ has been positively confirmed on large datasets by the 1000 Genomes Project analysis subgroup. AVAILABILITY: http://samtools.sourceforge.net CONTACT: [email protected].
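
As a rough illustration of what a per-base alignment quality expresses, the sketch below converts a misalignment probability into a Phred-scaled score and combines it with the sequencer's base quality. It does not implement the profile HMM itself. The function names, the cap of 60, and the choice to take the minimum of the two qualities are assumptions made for this example, not details stated in the abstract.

```python
import math


def phred_from_prob(p_misaligned: float, cap: int = 60) -> int:
    """Phred-scale a misalignment probability: Q = -10 * log10(p).

    `cap` is a hypothetical upper bound used here to avoid an infinite
    score when the probability is exactly zero.
    """
    if not 0.0 <= p_misaligned <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p_misaligned == 0.0:
        return cap
    return min(cap, int(round(-10.0 * math.log10(p_misaligned))))


def effective_quality(base_quality: int, baq: int) -> int:
    """Assumed combination rule: a base is only as reliable as the
    weaker of its sequencing quality and its alignment quality."""
    return min(base_quality, baq)


if __name__ == "__main__":
    # A base near an indel with a 10% chance of being misaligned
    baq = phred_from_prob(0.1)
    print(baq, effective_quality(35, baq))
```

Under this assumed rule, a confidently sequenced base that sits in an ambiguous alignment near an indel contributes little evidence to a SNP call, which is the effect the abstract describes.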

Related resources

SNP and mutation discovery using base-specific cleavage and MALDI-TOF mass spectrometry

MOTIVATION Single Nucleotide Polymorphisms (SNPs) are believed to contribute strongly to the genetic variability in living beings, in particular their disease or drug side effect predispositions. Mutation-induced sequence variations are playing an important role in the development of cancer, among others. From this, it is clear that SNP and mutation discovery is of great interest in today's Lif...

Slider—maximum use of probability information for alignment of short sequence reads and SNP detection

MOTIVATION A plethora of alignment tools have been created that are designed to best fit different types of alignment conditions. While some of these are made for aligning Illumina Sequence Analyzer reads, none of these are fully utilizing its probability (prb) output. In this article, we will introduce a new alignment approach (Slider) that reduces the alignment problem space by utilizing each...

Performance comparison of SNP detection tools with illumina exome sequencing data—an assessment using both family pedigree information and sample-matched SNP array data

To apply exome-seq-derived variants in the clinical setting, there is an urgent need to identify the best variant caller(s) from a large collection of available options. We have used an Illumina exome-seq dataset as a benchmark, with two validation scenarios--family pedigree information and SNP array data for the same samples, permitting global high-throughput cross-validation, to evaluate the ...

Coval: Improving Alignment Quality and Variant Calling Accuracy for Next-Generation Sequencing Data

Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to min...

Using False Discovery Rates to Benchmark SNP-callers in next-generation sequencing projects

Sequence alignments form the basis for many comparative and population genomic studies. Alignment tools provide a range of accuracies dependent on the divergence between the sequences and the alignment methods. Despite widespread use, there is no standard method for assessing the accuracy of a dataset and alignment strategy after resequencing. We present a framework and tool for determining the...

Journal:
  • Bioinformatics

Volume 27, Issue 8

Pages: -

Publication date: 2011